Minimizing Nonconvex Population Risk from Rough Empirical Risk

Authors

  • Chi Jin
  • Lydia T. Liu
  • Rong Ge
  • Michael I. Jordan
Abstract

Population risk—the expectation of the loss over the sampling mechanism—is always of primary interest in machine learning. However, learning algorithms only have access to empirical risk, which is the average loss over training examples. Although the two risks are typically guaranteed to be pointwise close, for applications with nonconvex nonsmooth losses (such as modern deep networks), the effects of sampling can transform a well-behaved population risk into an empirical risk with a landscape that is problematic for optimization. The empirical risk can be nonsmooth, and it may have many additional local minima. This paper considers a general optimization framework which aims to find approximate local minima of a smooth nonconvex function F (population risk) given only access to the function value of another function f (empirical risk), which is pointwise close to F (i.e., ‖F − f‖∞ ≤ ν). We propose a simple algorithm based on stochastic gradient descent (SGD) on a smoothed version of f which is guaranteed to find an ε-second-order stationary point if ν ≤ O(ε^1.5/d), thus escaping all saddle points of F and all the additional local minima introduced by f. We also provide an almost matching lower bound showing that our SGD-based approach achieves the optimal trade-off between ν and ε, as well as the optimal dependence on problem dimension d, among all algorithms making a polynomial number of queries. As a concrete example, we show that our results can be directly used to give sample complexities for learning a ReLU unit, whose empirical risk is nonsmooth.
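The core idea—running SGD on a Gaussian-smoothed version of f using only function values—can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: `smoothed_grad` is the standard two-point Gaussian-smoothing gradient estimator (an unbiased estimate of the gradient of f_σ(x) = E[f(x + σz)], z ~ N(0, I)), and the step sizes, smoothing radius, and iteration count are illustrative choices rather than the theoretically prescribed ones.

```python
import numpy as np

def smoothed_grad(f, x, sigma, rng):
    # Two-point estimator: unbiased for the gradient of the smoothed
    # function f_sigma(x) = E_z[f(x + sigma * z)], z ~ N(0, I).
    # Smoothing averages out small-amplitude roughness in f (||F - f||_inf <= nu),
    # so the estimate tracks the gradient of the underlying smooth F.
    z = rng.standard_normal(x.shape)
    return (f(x + sigma * z) - f(x - sigma * z)) / (2 * sigma) * z

def zeroth_order_sgd(f, x0, sigma=0.1, lr=0.05, steps=2000, seed=0):
    # SGD on the smoothed surrogate, using only evaluations of f.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= lr * smoothed_grad(f, x, sigma, rng)
    return x
```

For example, with F(x) = ‖x‖² and f = F plus a bounded high-frequency perturbation (playing the role of the sampling-induced roughness), `zeroth_order_sgd(f, [2.0, -1.5])` drifts toward the minimizer of F rather than getting trapped by the oscillations of f.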


Related articles

Minimizing loss probability bounds for portfolio selection

In this paper, we derive a portfolio optimization model by minimizing upper and lower bounds of loss probability. These bounds are obtained under a nonparametric assumption on the underlying return distribution by modifying the so-called generalization error bounds for the support vector machine, which have been developed in the field of statistical learning. Based on the bounds, two fractional prog...


Risk Minimization in Stochastic Volatility Models: Model Risk and Empirical Performance

In this paper the performance of locally risk-minimizing delta hedge strategies for European options in stochastic volatility models is studied from an experimental as well as from an empirical perspective. These hedge strategies are derived for a large class of diffusion-type stochastic volatility models, and they are as easy to implement as usual delta hedges. Our simulation results on model ...


Risk-taking behavior in the presence of nonconvex asset dynamics.

The growing literature on poverty traps emphasizes the links between multiple equilibria and risk avoidance. However, multiple equilibria may also foster risk-taking behavior by some poor people. We illustrate this idea with a simple analytical model in which people with different wealth and ability endowments make investment and risky activity choices in the presence of known nonconvex asset d...


Dynamic Cross Hedging Effectiveness between Gold and Stock Market Based on Downside Risk Measures: Evidence from Iran Emerging Capital Market

This paper examines the hedging effectiveness of gold futures for the stock market in minimizing variance and downside risks, including value at risk and expected shortfall using data from the Iran emerging capital market during four different sub-periods from December 2008 to August 2018. We employ dynamic conditional correlation models including VARMA-BGARCH (DCC, ADCC, BEKK, and ABEKK) and c...


Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation

Discriminative training for machine translation has been well studied in the recent past. A limitation of the work to date is that it relies on the availability of high-quality in-domain bilingual text for supervised training. We present an unsupervised discriminative training framework to incorporate the usually plentiful target-language monolingual data by using a rough “reverse” translation ...



Publication date: 2018